Bioinformatics analysis of long non-coding RNA-associated competing endogenous RNA network in schizophrenia

Schizophrenia (SCZ) is a serious psychiatric condition with a 1% lifetime risk. SCZ is one of the top ten global causes of disabilities. Despite numerous attempts to understand the function of genetic factors in SCZ development, genetic components in SCZ pathophysiology remain unknown. The competing endogenous RNA (ceRNA) network has been demonstrated to be involved in the development of many kinds of diseases. The ceRNA hypothesis states that cross-talks between coding and non-coding RNAs, including long non-coding RNAs (lncRNAs), via miRNA complementary sequences known as miRNA response elements, creates a large regulatory network across the transcriptome. In the present study, we developed a lncRNA-related ceRNA network to elucidate molecular regulatory mechanisms involved in SCZ. Microarray datasets associated with brain regions (GSE53987) and lymphoblasts (LBs) derived from peripheral blood (sample set B from GSE73129) of SCZ patients and control subjects containing information about both mRNAs and lncRNAs were downloaded from the Gene Expression Omnibus database. The GSE53987 comprised 48 brain samples taken from SCZ patients (15 HPC: hippocampus, 15 BA46: Brodmann area 46, 18 STR: striatum) and 55 brain samples taken from control subjects (18 HPC, 19 BA46, 18 STR). The sample set B of GSE73129 comprised 30 LB samples (15 patients with SCZ and 15 controls). Differentially expressed mRNAs (DEmRNAs) and lncRNAs (DElncRNAs) were identified using the limma package of the R software. Using DIANA-LncBase, Human MicroRNA Disease Database (HMDD), and miRTarBase, the lncRNA- associated ceRNA network was generated. Pathway enrichment of DEmRNAs was performed using the Enrichr tool. We developed a protein–protein interaction network of DEmRNAs and identified the top five hub genes by the use of STRING and Cytoscape, respectively. Eventually, the hub genes, DElncRNAs, and predictive miRNAs were chosen to reconstruct the subceRNA networks. Our bioinformatics analysis showed that twelve key DEmRNAs, including BDNF, VEGFA, FGF2, FOS, CD44, SOX2, NRAS, SPARC, ZFP36, FGG, ELAVL1, and STARD13, participate in the ceRNA network in SCZ. We also identified DLX6-AS1, NEAT1, MINCR, LINC01094, DLGAP1-AS1, BABAM2-AS1, PAX8-AS1, ZFHX4-AS1, XIST, and MALAT1 as key DElncRNAs regulating the genes mentioned above. Furthermore, expression of 15 DEmRNAs (e.g., ADM and HLA-DRB1) and one DElncRNA (XIST) were changed in both the brain and LB, suggesting that they could be regarded as candidates for future biomarker studies. The study indicated that ceRNAs could be research candidates for investigating SCZ molecular pathways.


Data preprocessing and DEmRNAs and DElncRNAs identification. We used Robust Multichip
Average (RMA) for background correction, and quantile normalization of all the raw data files 26 . Brain regions and LB samples were normalized separately. An interquartile range (IQR) filter (IQR across the samples on the log base two scale greater than median IQR) was used to reduce the number of evaluated genes followed by an intensity filter (a minimum of > 100 expression signals in a minimum of 25% of the arrays) aiming to remove the non-significant probe sets which are not expressed or not changing 27 . AgiMicroRna Bioconductor package was used for assessing the quality control 28 . We employed principal component analysis (PCA) to conduct a dimensional reduction analysis 29 , aiming to find similarities between each sample group using R software's ggplot2 package 30 . Besides, a heatmap was drawn to show the correlation between samples using the Pheatmap package of R 31 . Differential expression gene analysis (DEGA) was done between SCZ and normal samples using the linear models for microarray data (limma) R package 32 in Bioconductor (https:// www. bioco nduct or. org/) 33 . We used the removeBatchEffect() function from the limma package to remove batch effects. Age, gender, and race were used as covariates in GSE73129 study and race and post-mortem interval in GSE53987 study to modify www.nature.com/scientificreports/ the data for confounding variables. We did a linear model on the main factor "group" (disease vs. control) using limma's lmFit() function and included covariates in the linear model as well. The limma's eBayes () function was then used to calculate differentially expressed data from the linear fit model. We utilized the previously used approach to identify lncRNA probes 34 . We downloaded the complete list of lncRNA genes with approved HUGO Gene Nomenclature Committee (HGNC) symbols from (https:// www. genen ames. org/) 35 . Then, we compared the lncRNA gene list with our dataset gene symbols and chose the overlapped genes. Student t-test was used and we set the aberrantly expressed RNAs cut-off as: (1) a false discovery rate (adjusted P-value) < 0.001, and (2) |log2 fold change (log2FC)|> 0.69. Note that we discarded probes with no gene symbols.
lncRNA-associated ceRNA network construction. DIANA-LncBase v3 36 was used to identify the experimentally validated interactions between the miRNAs and lncRNAs. Homo Sapiens "Species", Brain or Peripheral Blood "Tissue", and high "miRNA Confidence Levels" were selected as criteria for the DIANA-LncBase query. SCZ-related miRNAs were collected from the Human microRNA Disease Database (HMDD) v3.2 database 37 . Besides, we obtained the interactions between miRNAs that were collected using the HMDD database and target mRNAs supported by strong experimental evidence from miRTarBase 38 . These mRNAs were then compared with the previously obtained mRNAs. Then, we used the duplicated mRNAs to construct the lncRNA-miRNA-mRNA network. LncRNAs, targeted mRNAs, and the interacted miRNAs were removed from the ceRNA network in the opposite expression pattern between the targeted mRNAs and lncRNAs. Cytoscape software (version 3.8.0) 39 was used to generate ceRNA regulatory network.
Pathway enrichment analysis of DEmRNAs. The Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis was performed using the Enrichr tool 40,41 for pathway enrichment to analyze the DEmRNAs that were in the ceRNA network.

Protein-protein interaction (PPI) network construction and hub genes identification. A PPI
network was constructed using online STRING database (https:// string-db. org/) 42 for predicting the interactive relationships of common DEmRNAs encoding proteins. A minimum interaction score of above 0.7 (high confidence) was needed for PPI network construction. Cytoscape software was employed for PPI network visualization and hub genes analysis. For PPI network simplification, we removed the non-interacting genes. Then, we considered the top five genes with the highest degree of connection to the others as the hub genes according to CytoHubba (version 0.1) analysis 43 from Cytoscape.
Reconstruction of the subceRNA network. We retrieved the corresponding lncRNA and miRNA from the original network of ceRNA. We used them in the reconstruction of the subceRNA network to confirm the ceRNA network of hub genes.

Results
DEmRNAs and DElncRNAs identification. Background correction, normalization, gene filtration, and batch correction were performed prior to DEGA. We applied AgiMicroRna Bioconductor package for quality control assessment. Box plots of gene expression data were depicted to evaluate data distribution following normalization ( Supplementary Fig. S1). In the box plots, separate arrays had similar medians of expression level, confirming that correction was carried out properly. We checked sample correlation at the probe level. The samples were divided into two main clusters, one of which had SCZ, and the other consisted of the control samples. Moreover, a PCA plot was used to demonstrate the spatial distribution of samples ( Fig. 1). PCA presents information about the structure of the analyzed data. It helps find similarities between samples. One of the STR samples in the SCZ group was removed due to being spatially far from other SCZ samples. www.nature.com/scientificreports/  www.nature.com/scientificreports/ lncRNA-associated ceRNA network construction. The DIANA-LncBase v3 online tool was used to anticipate DElncRNA-miRNA interaction pairings. DELncRNAs, targeted DEmRNAs, and also the interacted miRNAs were deleted from the ceRNA network in the opposite expression pattern present between DElncR-NAs and the targeted DEmRNAs. Following that, miRTarBase was utilized to identify interactions between target mRNAs and SCZ-associated miRNAs (discovered by the HMDD). We developed a regulatory ceRNA network based on DElncRNA-miRNA-DEmRNA interactions to explore the potential mechanism behind SCZ pathophysiology ( Fig. 3 and Table 2). CeRNA network contained 50 nodes (seven lncRNAs, 17 miRNAs, and 26 mRNAs) and 95 edges in HPC, 54 nodes (five lncRNAs, 19 miRNAs, and 30 mRNAs) and 101 edges in BA46, 54 nodes (seven lncRNAs, 19 miRNAs, and 28 mRNAs) and 97 edges in STR, and three nodes (one lncRNA, one miRNA, and one mRNA) and two edges in LB. There was no common ceRNA axis between LB and brain regions. We identified one ceRNA axis in LB (MALAT1/hsa-miR-9-5p/STARD13). Among the list, 90 different ceRNA axes were common between all three areas of the brain. These ceRNA axes are bolded in Table 1.

Discussion
The expression patterns of ceRNA differ depending on tissue-related, cellular, and subcellular circumstances. A network might consist of different types of ceRNAs, such as lncRNAs, circRNAs, pseudogenes, as well as mRNAs 45 . LncRNAs, one of the major forms of RNAs, are identified in the ceRNA machinery and perform a significant role in physiological and pathological cellular mechanisms. It has been revealed that they are implicated in a variety of mental disorders (e.g., AD, SCZ, and autism spectrum disorder (ASD)) 46 . It is now well accepted that lncRNA expression varies by tissue, cellular types, and developmental levels. This tissue specificity, besides subcellular distributions, clearly indicates that lncRNA expression is tightly regulated 47 . According to the as-mentioned theoretical concepts, the ceRNA regulation network related to lncRNAs can play a crucial role in SCZ pathogenesis. In this study, we used a public database to download the expression profiles of four different types of SCZ tissues to evaluate the DElncRNAs and DEmRNAs in SCZ and normal tissues and then create a lncRNA-miRNA-mRNA regulatory network. We identified possible lncRNA-miRNA-mRNA axes in the pathogenesis of SCZ consisting of nine key lncRNAs, 17 key miRNAs, 11 key mRNAs in the brain, and one key lncRNA, one key miRNA, one key mRNA in LB. LncRNA expression has been hypothesized to be dysregulated in SCZ and might be involved in the pathogenesis 48 . Among the key lncRNAs identified by our analysis, only the association of NEAT1, MALAT1, and DLX6-AS1 with SCZ has been the specific focus of previous studies. NEAT1 is a member of the lncRNA family highly enriched in the mammalian brain 49 . It is a crucial scaffolding unit as well as a structural determinant of paraspeckles 50 , which is a subnuclear structure implicated in several cellular processes such as splicing and transcriptional modulation through chromatin structure modification 49 . This type of lncRNA has already been found to be upregulated in other types of mental illnesses, namely Huntington's disease 51 and ASD 52 . Studies related to the comparison of NEAT1 expression in the patients with SCZ against the control group have shown contradictory findings. Some studies identified downregulation of NEAT1 in peripheral blood 53 and multiple subareas of the cerebral cortex such as BA46 and HPC 54 of SCZ patients compared with healthy controls. However, others revealed no significant difference in peripheral blood 18 , or upregulation in BA46, HPC, and STR regions 7 , which is similar to our results. Further studies are needed to explain these contradictory findings. MALAT1 is predominantly expressed in nerve cells and is transcriptionally elevated in nuclear speckles. It has been reported that MALAT1 employs splicing proteins of the Ser/Arg (SR) family in transcription sites in order to regulate synaptogenesis-related gene expression in hippocampal neurons cultivated. Furthermore, in this system, knocking down MALAT1 results in a reduction in synaptic density, while overexpression promotes the density 55 . A study recently has shown no substantial variation in the expression level of MALAT1 in SCZ patients' peripheral blood compared with healthy control 56 . This finding is inconsistent with our results which might be attributed to the treatment of the patients with clozapine. The upregulation of DLX6-AS1, a regulator of GABAergic interneuron development 57 , in cerebral organoids derived from induced pluripotent stem cells showed that it might play a key role in ASD and SCZ pathogenesis 58 . In addition, we found XIST as another key lncRNA in the ceRNA network, which is associated with mental disorders. LncRNA XIST regulates the inactivation of the X chromosome, which is a critical developmental mechanism in the brain. XIST is highly expressed in lymphoblastoid cells from female patients diagnosed with bipolar disorder or recurrent severe depression as well as postmortem human brains of female psychiatric patients. Our results are in line with these findings. To our knowledge, the association of the other six key lncRNAs (MINCR, LINC01094, DLGAP1-AS1, BABAM2-AS1, PAX8-AS1, and ZFHX4-AS1) has not been the specific focus of investigations in the context of mental disorders. MINCR is a Myc-induced lncRNA related to some cancers via acting as a ceRNA 59 . Similarly, recent studies report that LINC01094 60 , DLGAP1-AS1 61 , and PAX8-AS1 62 can act as a ceRNA in some cancers. Besides, it has been shown that BABAM2-AS1 (also known as BRE-AS1) 63 and ZFHX4-AS1 64 regulate cancer cells through multiple mechanisms. The association between some of the lncRNAs with SCZ has been reported in this study for the first time; therefore, further studies are needed to validate our findings.
In the present study, KEGG enrichment analysis was performed as well. The results showed that most of the DEmRNAs involve in viral infection and three different signaling pathways. A previous study found a higher prevalence of human herpesvirus 8 (HHV8) infection in the patients with SCZ against the control group 70 . HHV8 has been associated with the development of Kaposi sarcoma 71 . Previously, a meta-analysis demonstrated that people with SCZ are more likely to have infectious hepatitis, which could be attributed to behavioral, immunological parameters, or both 72 . It is also shown that expression levels of mitogen-activated protein kinase (MAPK) 73 and forkhead box O (FOXO) pathway-related genes 74 are altered in SCZ. Association between relaxin polymorphisms and SCZ also suggests that the relaxin pathway plays a role in the development of SCZ 75 . Dysfunction of these signaling pathways might lead to functional pathology in SCZ. www.nature.com/scientificreports/ We found that 15 mRNAs (ADM, CLEC2B, F13A1, HLA-DQB1, HLA-DRB1, HLA-DRB3, HLA-DRB4, HLA-DRB5, CSF2RB, RHPN2, DDX3Y, EIF1AY, KDM5D, RPS4Y1, USP9Y) and one lncRNA (XIST) were differentially expressed in the LB and brain of patients with SCZ using a Venn diagram, suggesting that these RNAs could be regarded as a biomarker for SCZ. Among them, ADM 76 and HLA-DRB1 77 have been suggested as biomarkers in previous studies.
Twelve key mRNAs (BDNF, VEGFA, FGF2, FOS, CD44, SOX2, NRAS, SPARC , ZFP36, FGG, ELAVL1, and STARD13) were reported in this study. BDNF is capable of regulating neuroplasticity, inhibiting the cascade of apoptosis, and raising the expression of many cellular proteins essential for neurogenesis, neuronal proliferation, as well as survival 78 . A myriad of studies have shown that low BDNF levels are associated with poor cognitive skills in SCZ patients 79 , which is consistent with our result. The VEGFA molecule is necessary for angiogenesis, brain plasticity, and neurogenesis, and it differs from other neurotrophic factors due to its significant angiogenic role. VEGFA can enhance the proliferation of neural stem cells, endothelial cells, mature astrocytes, and neurons. VEGF signaling is changed in SCZ, according to the results from transcriptomic, epigenomic, and genetic variation studies in hippocampus and prefrontal cortex samples 80 . In addition, FGF2 is involved in angiogenesis. FGF2 was later confirmed to be a neurotrophic factor, significant in neural development and central nervous system maintenance. Serum levels of FGF2 protein were found to be substantially higher in patients with firstepisode, drug-free SCZ 81 . As one of the immediate early genes most studied in the brain, FOS is believed to play a significant role in SCZ pathophysiology. In line with our findings, FOS is upregulated in some areas of an SCZ rat model brain 82 . Contrarily, some research found that FOS is highly expressed in non-neural peripheral specimens in addition to the reduction of the FOS in the brain of SCZ patients 83 . More research is required to understand these conflicting findings. Also, contradictory results have been obtained from the study of CD44 [84][85][86] and SPARC 87 dysregulation components of the brain extracellular matrix, in some brain regions. Another key gene in our study was SOX2. The association of this gene, which is a promoter of neurogenesis, with SCZ has been investigated in some animal studies 88,89 . The NRAS-encoded protein, a Ras superfamily intrinsic GTPase, is commonly associated with cancer progression and advancement. It is also essential in neurodevelopmental disorders due to its involvement in signal transduction from activated receptors toward the MAPK cascade. A previous study reported higher expression of this gene in the frontal cortex of SCZ patients 90 . This data is in agreement with our results. The genetic network of hemostatic procedure demonstrated that FGG, as a coagulation-related mediator gene, may have a role in SCZ 91 . ELAVL1 plays a crucial role in mRNA stabilization (mediated by AU-rich elements) and translation as well as neuronal mRNA post-transcriptional regulation. The capability of ELAVL1 in stabilizing mRNA targets may be influenced by posttranslational modifications. It is postulated that an inefficient regulation of MeCP2 transcription in the SZ cohort is caused by alterations in one of its interactant targets, ELAVL1 92 . To the best of our knowledge, the role of two other key genes (i.e., ZFP36 and STARD13) has not been investigated in SCZ thus far. Similar to ELAVL1, ZFP36 is disease-relevant RNAbinding proteins (RBPs) that interact with AU-rich sequences, but unlike ELAVL1, ZFP36 mediates the decay of target mRNAs 93 . In addition, its association with mental conditions (e.g., AD 94 and suicidal behavior 95 ) has been reported previously. Moreover, STARD13, a GTPase-activating protein (GAP), inhibits RhoA and Cdc42 specifically in glioblastoma cells and other types of tumor models 96 .
There are some limitations in the present study. First, variations in methodology and the preparation of samples, the platform, patient characteristics, and data processing may all have an effect on gene expression profiles [97][98][99] . Secondly, it is challenging to explain dynamic variations in disease progression and the emergence of complications using postmortem brains 13 . Finally, a validation analysis is needed to substantiate the reliability of our findings.